
    Parallel Implementations of Cellular Automata for Traffic Models

    Full text link
    The Biham-Middleton-Levine (BML) traffic model is a simple two-dimensional, discrete Cellular Automaton (CA) that has been used to study self-organization and phase transitions arising in traffic flows. From the computational point of view, the BML model exhibits the usual features of discrete CA, where the state of each cell is updated according to simple rules that depend on the cell itself and its neighbors. In this paper we study the impact of various optimizations for speeding up CA computations, using the BML model as a case study. In particular, we describe and analyze the impact of several parallel implementations that rely on CPU features, such as multiple cores or SIMD instructions, and on GPUs. Experimental evaluation provides quantitative measures of the payoff of each technique in terms of speedup with respect to a plain serial implementation. Our findings show that the performance gap between CPU and GPU implementations of the BML traffic model can be reduced by clever exploitation of all CPU features.
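    For orientation, the core of a BML update is simple enough to sketch. The following C++ fragment is an editorial illustration, not code from the paper: it performs one synchronous sub-step for one species of cars (using the common convention that one species moves rightward and the other downward on alternating steps) and parallelizes the outer loop with OpenMP. Double buffering makes the writes race-free, because a car only ever moves into a cell that is empty in the current grid.

        // Illustrative sketch (not the paper's code): one synchronous BML
        // sub-step, rows distributed across OpenMP threads.
        #include <vector>

        enum Cell : unsigned char { EMPTY, RED, BLUE };  // RED moves right, BLUE moves down

        using Grid = std::vector<std::vector<Cell>>;

        // Move all cars of one species simultaneously on an n x n torus.
        // Safe to parallelize: a car moves only into a cell that is EMPTY in
        // `cur`, so no two iterations ever write the same cell of `next`.
        void step(const Grid& cur, Grid& next, Cell species) {
            const int n = static_cast<int>(cur.size());
            next = cur;
            #pragma omp parallel for
            for (int i = 0; i < n; ++i) {
                for (int j = 0; j < n; ++j) {
                    if (cur[i][j] != species) continue;
                    const int ti = species == BLUE ? (i + 1) % n : i;  // down for BLUE
                    const int tj = species == RED  ? (j + 1) % n : j;  // right for RED
                    if (cur[ti][tj] == EMPTY) {
                        next[i][j] = EMPTY;
                        next[ti][tj] = species;
                    }
                }
            }
        }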

    A Cascade Neural Network Architecture investigating Surface Plasmon Polaritons propagation for thin metals in OpenMP

    Full text link
    Surface plasmon polaritons (SPPs) confined along a metal-dielectric interface have attracted considerable interest in the areas of ultracompact photonic circuits, photovoltaic devices and other applications, due to their strong field confinement and enhancement. This paper investigates a novel cascade neural network (NN) architecture to find the dependence of SPP propagation on the metal thickness. Additionally, a novel training procedure for the proposed cascade NN has been developed using an OpenMP-based framework, thus greatly reducing training time. The performed experiments confirm the effectiveness of the proposed NN architecture for the problem at hand.
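    As an aside, the generic pattern behind OpenMP-accelerated training is worth sketching. The paper's cascade architecture and training procedure are more involved; the fragment below is an editorial illustration, with a single sigmoid neuron standing in for the network, showing only how per-sample gradient work can be spread across threads and then merged.

        // Minimal sketch: one batch gradient-descent epoch with the
        // per-sample gradient accumulation parallelized via OpenMP.
        #include <cmath>
        #include <vector>

        void epoch(const std::vector<std::vector<double>>& x,  // inputs
                   const std::vector<double>& y,               // targets
                   std::vector<double>& w, double lr) {        // weights, step size
            const int n = static_cast<int>(x.size());
            const int d = static_cast<int>(w.size());
            std::vector<double> grad(d, 0.0);
            #pragma omp parallel
            {
                std::vector<double> local(d, 0.0);   // thread-private accumulator
                #pragma omp for nowait
                for (int s = 0; s < n; ++s) {
                    double z = 0.0;
                    for (int k = 0; k < d; ++k) z += w[k] * x[s][k];
                    const double out = 1.0 / (1.0 + std::exp(-z));       // sigmoid
                    const double delta = (out - y[s]) * out * (1.0 - out);
                    for (int k = 0; k < d; ++k) local[k] += delta * x[s][k];
                }
                #pragma omp critical                 // merge thread-local gradients
                for (int k = 0; k < d; ++k) grad[k] += local[k];
            }
            for (int k = 0; k < d; ++k) w[k] -= lr * grad[k] / n;
        }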

    OpenCL Actors - Adding Data Parallelism to Actor-based Programming with CAF

    Full text link
    The actor model of computation has been designed for seamless support of concurrency and distribution. However, it remains unspecific about data-parallel program flows, while the available processing power of modern many-core hardware such as graphics processing units (GPUs) or coprocessors increases the relevance of data parallelism for general-purpose computation. In this work, we introduce OpenCL-enabled actors to the C++ Actor Framework (CAF). This offers a high-level interface for accessing any OpenCL device without leaving the actor paradigm. The new type of actor is integrated into the runtime environment of CAF and gives rise to transparent message passing in distributed systems on heterogeneous hardware. Following the actor logic in CAF, OpenCL kernels can be composed while encapsulated in C++ actors, and hence operate in a multi-stage fashion on data resident at the GPU. Developers are thus enabled to build complex data-parallel programs from primitives without leaving the actor paradigm or sacrificing performance. Our evaluations on commodity GPUs, an Nvidia TESLA, and an Intel PHI reveal the expected linear scaling behavior when offloading larger workloads. For sub-second duties, the efficiency of offloading was found to differ largely between devices. Moreover, our findings indicate a negligible overhead over programming with the native OpenCL API.
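    CAF spawns its OpenCL actors through the framework's own factory functions; the sketch below is deliberately not the CAF API. It is a stripped-down editorial illustration of the underlying idea: a data-parallel stage hidden behind an asynchronous actor-style mailbox, with a plain function object standing in for an actual OpenCL kernel launch.

        // Conceptual sketch only: a "kernel" stage wrapped behind a mailbox,
        // so callers interact with it purely by message passing.
        #include <condition_variable>
        #include <functional>
        #include <mutex>
        #include <queue>
        #include <thread>
        #include <vector>

        using Batch = std::vector<float>;
        using Stage = std::function<Batch(const Batch&)>;  // stands in for a kernel launch

        class KernelActor {
        public:
            KernelActor(Stage stage, std::function<void(Batch)> reply)
                : stage_(std::move(stage)), reply_(std::move(reply)),
                  worker_([this] { run(); }) {}
            ~KernelActor() {
                { std::lock_guard<std::mutex> g(m_); done_ = true; }
                cv_.notify_one();
                worker_.join();
            }
            void send(Batch msg) {  // asynchronous message passing
                { std::lock_guard<std::mutex> g(m_); mailbox_.push(std::move(msg)); }
                cv_.notify_one();
            }
        private:
            void run() {
                for (;;) {
                    std::unique_lock<std::mutex> lk(m_);
                    cv_.wait(lk, [this] { return done_ || !mailbox_.empty(); });
                    if (mailbox_.empty()) return;  // shut down once drained
                    Batch msg = std::move(mailbox_.front());
                    mailbox_.pop();
                    lk.unlock();
                    reply_(stage_(msg));  // "offload", then reply with the result
                }
            }
            Stage stage_;
            std::function<void(Batch)> reply_;
            std::queue<Batch> mailbox_;
            std::mutex m_;
            std::condition_variable cv_;
            bool done_ = false;
            std::thread worker_;  // declared last: starts after the rest is built
        };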

    Verification of loop parallelisations

    Get PDF
    Writing correct parallel programs becomes more and more difficult as the complexity and heterogeneity of processors increase. This issue is addressed by parallelising compilers. Various compiler directives can be used to tell these compilers where to parallelise. This paper addresses the correctness of such compiler directives for loop parallelisation. Specifically, we propose a technique based on separation logic to verify whether a loop can be parallelised. Our approach requires each loop iteration to be specified with the locations that are read and written in that iteration. If the specifications are correct, they can be used to draw conclusions about loop (in)dependences. Moreover, they also reveal where synchronisation is needed in the parallelised program. The loop iteration specifications can be verified using permission-based separation logic, and they integrate seamlessly with functional behaviour specifications. We formally prove the correctness of our approach and discuss automated tool support for our technique. Additionally, we discuss how the loop iteration contracts can be compiled into specifications for the code coming out of the parallelising compiler.
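    To make the idea concrete, here is a schematic iteration contract for a trivially independent loop. The notation loosely follows permission-based separation-logic tools and is not the authors' exact syntax: each iteration claims read permission on a[i] and b[i] and write permission on c[i], and since the footprints of distinct iterations are disjoint, the loop carries no dependences and may be parallelised.

        // Schematic only; annotation syntax is illustrative, not tool-exact.
        /*@ context (\forall* int i; 0 <= i && i < n; Perm(a[i], read));
            context (\forall* int i; 0 <= i && i < n; Perm(b[i], read));
            context (\forall* int i; 0 <= i && i < n; Perm(c[i], write)); @*/
        void vadd(int n, const int *a, const int *b, int *c) {
            for (int i = 0; i < n; i++)
                /*@ requires Perm(a[i], read) ** Perm(b[i], read) ** Perm(c[i], write);
                    ensures  c[i] == a[i] + b[i]; @*/
                c[i] = a[i] + b[i];
        }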

    Improving Scalability and Maintenance of Software for High-Performance Scientific Computing by Combining MDE and Frameworks

    Get PDF
    In recent years, numerical simulation has attracted increasing interest within industry and among academics. Paradoxically, the development and maintenance of high-performance scientific computing software has become more complex due to the diversification of hardware architectures and their related programming languages and libraries. In this paper, we share our experience in using model-driven development for numerical simulation software. Our approach, called MDE4HPC, proposes to tackle development complexity by using a domain-specific modeling language to describe abstract views of the software. We present and analyse the results obtained with its implementation when deriving, from this abstract model, code targeting Arcane, a development framework for 2D and 3D numerical simulation software.

    Optimistic Parallelism on GPUs

    Full text link
    We present speculative parallelization techniques that can exploit parallelism in loops even in the presence of dynamic irregularities that may give rise to cross-iteration dependences. The execution of a speculatively parallelized loop consists of five phases: scheduling, computation, misspeculation check, result committing, and misspeculation recovery. While the first two phases enable exploitation of data parallelism, the latter three phases represent overhead costs of using speculation. We perform the misspeculation check on the GPU to minimize its cost. We perform result committing and misspeculation recovery on the CPU to reduce the result copying and recovery overhead. The scheduling policies are designed to reduce the misspeculation rate. Our programming model provides an API for programmers to give hints about potential misspeculations to reduce their detection cost. Our experiments yielded speedups of 3.62x-13.76x on an nVidia Tesla C1060 hosted in an Intel(R) Xeon(R) E5540 machine.
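    As a toy illustration of the five phases (an editorial sketch on the CPU, not the paper's GPU runtime), consider a loop a[idx[i]] = h(i) whose output dependences appear only when idx repeats a target. All iterations are scheduled and computed speculatively, conflicts are detected by counting writers per target, conflict-free results are committed, and the misspeculated iterations are re-executed serially in order.

        // Toy CPU sketch of speculative loop parallelization; `h` is a
        // side-effect-free stand-in for the loop body.
        #include <atomic>
        #include <vector>

        int h(int i) { return i * i; }

        void speculative_loop(std::vector<int>& a, const std::vector<int>& idx) {
            const int n = static_cast<int>(idx.size());
            std::vector<int> val(n);
            std::vector<std::atomic<int>> writers(a.size());  // zero-initialized

            // Phases 1-2: schedule all iterations at once, compute speculatively.
            #pragma omp parallel for
            for (int i = 0; i < n; ++i) {
                val[i] = h(i);                 // no side effects during speculation
                writers[idx[i]].fetch_add(1);  // remember who writes where
            }
            // Phase 3: misspeculation check: >1 writer on a target is a conflict.
            // Phase 4: commit the iterations whose target is conflict-free.
            std::vector<int> bad;
            for (int i = 0; i < n; ++i) {
                if (writers[idx[i]].load() == 1) a[idx[i]] = val[i];
                else bad.push_back(i);
            }
            // Phase 5: recovery: re-run misspeculated iterations serially,
            // in original order, restoring sequential semantics.
            for (int i : bad) a[idx[i]] = h(i);
        }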

    Towards High Performance Relativistic Electronic Structure Modelling: The EXP-T Program Package

    Full text link
    Modern challenges arising in the fields of theoretical and experimental physics require new powerful tools for high-precision electronic structure modelling; one of the most promising such tools is the relativistic Fock space coupled cluster method (FS-RCC). Here we present a new extensible implementation of the FS-RCC method designed for modern parallel computers. The underlying theoretical model, algorithms and data structures are discussed, and the performance and scaling features of the implementation are analyzed. The software developed makes it possible to achieve a completely new level of accuracy in predicting the properties of atoms and molecules containing heavy and superheavy nuclei.
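    For orientation, the textbook ingredients of Fock-space coupled cluster (stated here in their standard form, not as EXP-T specifics) are compact. The wave operator is a normal-ordered exponential of cluster operators accumulated over Fock-space sectors, and energies are obtained by diagonalizing an effective Hamiltonian in the model space P:

        \Omega = \{ e^{T} \}, \qquad T = \sum_{(n_h,\, n_p)} T^{(n_h,\, n_p)},

        H^{\mathrm{eff}} = P\, H\, \Omega\, P, \qquad H^{\mathrm{eff}} C = C\, E,

    where the braces denote normal ordering and the sector (n_h, n_p) counts the holes and particles attached to the closed-shell reference; the amplitude equations are solved sector by sector.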

    Determination of the smile line in patients treated at the Cátedra Prostodoncia IV B, FOUNC

    Get PDF
    D'itria, José Antonio (Cátedra de Prostodoncia B); Elizondo Cassab, Elby Elizabeth (Cátedra de Prostodoncia B); Rugani, Nelson L. J. (Cátedra de Prostodoncia IV B); Sánchez Dagum, Mercedes (Cátedra de Odontología Preventiva y Comunitaria I); all at Universidad Nacional de Córdoba, Facultad de Odontología, Argentina. Objectives: (1) to analyze, clinically and photographically, the smile line in partially edentulous patients; (2) to evaluate the relationship between the smile line and the factors sex and age. Methodology: an observational, descriptive, cross-sectional study. Adult patients aged 20 to 75 years, with at least one anterior tooth present in the mouth, treated at the Cátedra de Prostodoncia IV B between March and November 2015 (n = 100), were evaluated with written informed consent, under CIEIS FO UNC regulations. The variables evaluated were sex, age, and smile line classification (Tijan et al.). Photographic records were taken with a Canon D50 camera on a tripod, with the patient seated facing forward, both in a predetermined position. The descriptive analysis will use measures of central tendency (means, median, percentages, and standard deviation); the inferential analysis will use bivariate analysis, the chi-squared test, and a linear-trend analysis of proportions, with p < 0.05 considered significant.

    The Network Analysis of Urban Streets: A Primal Approach

    Full text link
    The network metaphor in the analysis of urban and territorial cases has a long tradition, especially in transportation/land-use planning and economic geography. More recently, urban design has brought its contribution by means of the "space syntax" methodology. All these approaches, though under different terms like accessibility, proximity, integration, connectivity, cost or effort, focus on the idea that some places (or streets) are more important than others because they are more central. The study of centrality in complex systems, however, originated in other scientific areas, namely in structural sociology, well before its use in urban studies; moreover, as a structural property of the system, centrality has never been extensively investigated metrically in geographic networks as it has been topologically in a wide range of other relational networks like social, biological or technological ones. After two previous works on some structural properties of the dual and primal graph representations of urban street networks (Porta et al. cond-mat/0411241; Crucitti et al. physics/0504163), in this paper we provide an in-depth investigation of centrality in the primal approach as compared to the dual one, with a special focus on potentials for urban design.
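    For reference, the standard centrality indices underlying this line of work read, for a node i in a graph of N nodes,

        C^{C}_{i} = \frac{N - 1}{\sum_{j \neq i} d_{ij}}, \qquad
        C^{B}_{i} = \frac{1}{(N-1)(N-2)} \sum_{\substack{j \neq k \\ j, k \neq i}} \frac{n_{jk}(i)}{n_{jk}},

    where, in the primal (metric) approach, d_{ij} is the shortest-path distance measured in street length rather than in hops, n_{jk} is the number of shortest paths between j and k, and n_{jk}(i) is the number of those paths that pass through i.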

    TomograPy: A Fast, Instrument-Independent, Solar Tomography Software

    Full text link
    Solar tomography has progressed rapidly in recent years thanks to the development of robust algorithms and the availability of more powerful computers. It can today provide crucial insights into issues related to the line-of-sight integration present in the data of solar imagers and coronagraphs. However, challenges remain, such as the increase in the available volume of data, the handling of the temporal evolution of the observed structures, and the heterogeneity of the data in multi-spacecraft studies. We present TomograPy (http://nbarbey.github.com/TomograPy/), an open-source generic software package, freely available on the Python Package Index, that performs fast tomographic inversions, scales linearly with the number of measurements, with the length of the reconstruction cube (not the number of voxels), and with the number of cores, and can use data from different sources with a variety of physical models. For performance, TomograPy uses a parallelized-projection algorithm. It relies on the World Coordinate System standard to manage various data sources. A variety of inversion algorithms are provided to perform the tomographic-map estimation, and a test suite is provided along with the code to ensure software quality. Since it makes use of the Siddon algorithm, TomograPy is restricted to rectangular-parallelepiped voxels, but the spherical geometry of the corona can be handled through proper use of priors. We describe the main features of the code and show three practical examples of multi-spacecraft tomographic inversions using STEREO/EUVI and STEREO/COR1 data. Static and smoothly varying temporal-evolution models are presented.
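    To illustrate why per-ray projection parallelizes so cleanly, here is an editorial sketch of a projector: it uses uniform sampling along each ray for brevity, whereas the Siddon algorithm used by TomograPy accumulates exact ray/voxel intersection lengths. Rays are mutually independent, so the outer loop scales with the number of cores.

        // Minimal sketch: line integrals of an n^3 voxel cube along a set of
        // rays, one OpenMP task per ray (uniform sampling, not exact Siddon).
        #include <cmath>
        #include <vector>

        struct Ray { double ox, oy, oz, dx, dy, dz; };  // origin, unit direction

        void project(const std::vector<double>& x, int n,   // cube data, edge size
                     const std::vector<Ray>& rays,
                     double step, double tmax,              // sampling controls
                     std::vector<double>& y) {              // one output per ray
            y.assign(rays.size(), 0.0);
            #pragma omp parallel for  // rays are independent
            for (long r = 0; r < static_cast<long>(rays.size()); ++r) {
                const Ray& k = rays[r];
                double acc = 0.0;
                for (double t = 0.0; t < tmax; t += step) {
                    const int i = static_cast<int>(std::floor(k.ox + t * k.dx));
                    const int j = static_cast<int>(std::floor(k.oy + t * k.dy));
                    const int l = static_cast<int>(std::floor(k.oz + t * k.dz));
                    if (i < 0 || j < 0 || l < 0 || i >= n || j >= n || l >= n)
                        continue;  // sample falls outside the cube
                    acc += x[(static_cast<std::size_t>(i) * n + j) * n + l] * step;
                }
                y[r] = acc;
            }
        }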